Data visualization

Variable assignment

Throughout the exercises in this chapter, you’ll be visualizing a subset of the gapminder data from the year 1952. First, you’ll load the packages you need, including ggplot2, and create a gapminder_1952 dataset to visualize.

# Load the knitr and kableExtra packages
library(knitr)
library(kableExtra)
options(knitr.table.format = "html")
# Load the gapminder package
library(gapminder)
# Load the dplyr package
library(dplyr)
# Load the ggplot2 package as well
library(ggplot2)
theme_set(theme_bw())  # pre-set the bw theme.
# Create gapminder_1952
gapminder_1952 <- gapminder %>%
    filter(year == 1952)
# Look at the gapminder_1952 dataset
gapminder_1952 %>%
  kable(caption = "Gapminder from 1952") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE, position = "left", font_size = 11) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3f7689") %>%
  scroll_box(width = "100%", height = "300px")
Gapminder from 1952
country continent year lifeExp pop gdpPercap
Afghanistan Asia 1952 28.801 8425333 779.4453
Albania Europe 1952 55.230 1282697 1601.0561
Algeria Africa 1952 43.077 9279525 2449.0082
Angola Africa 1952 30.015 4232095 3520.6103
Argentina Americas 1952 62.485 17876956 5911.3151
Australia Oceania 1952 69.120 8691212 10039.5956
Austria Europe 1952 66.800 6927772 6137.0765
Bahrain Asia 1952 50.939 120447 9867.0848
Bangladesh Asia 1952 37.484 46886859 684.2442
Belgium Europe 1952 68.000 8730405 8343.1051
Benin Africa 1952 38.223 1738315 1062.7522
Bolivia Americas 1952 40.414 2883315 2677.3263
Bosnia and Herzegovina Europe 1952 53.820 2791000 973.5332
Botswana Africa 1952 47.622 442308 851.2411
Brazil Americas 1952 50.917 56602560 2108.9444
Bulgaria Europe 1952 59.600 7274900 2444.2866
Burkina Faso Africa 1952 31.975 4469979 543.2552
Burundi Africa 1952 39.031 2445618 339.2965
Cambodia Asia 1952 39.417 4693836 368.4693
Cameroon Africa 1952 38.523 5009067 1172.6677
Canada Americas 1952 68.750 14785584 11367.1611
Central African Republic Africa 1952 35.463 1291695 1071.3107
Chad Africa 1952 38.092 2682462 1178.6659
Chile Americas 1952 54.745 6377619 3939.9788
China Asia 1952 44.000 556263527 400.4486
Colombia Americas 1952 50.643 12350771 2144.1151
Comoros Africa 1952 40.715 153936 1102.9909
Congo, Dem. Rep. Africa 1952 39.143 14100005 780.5423
Congo, Rep. Africa 1952 42.111 854885 2125.6214
Costa Rica Americas 1952 57.206 926317 2627.0095
Cote d’Ivoire Africa 1952 40.477 2977019 1388.5947
Croatia Europe 1952 61.210 3882229 3119.2365
Cuba Americas 1952 59.421 6007797 5586.5388
Czech Republic Europe 1952 66.870 9125183 6876.1403
Denmark Europe 1952 70.780 4334000 9692.3852
Djibouti Africa 1952 34.812 63149 2669.5295
Dominican Republic Americas 1952 45.928 2491346 1397.7171
Ecuador Americas 1952 48.357 3548753 3522.1107
Egypt Africa 1952 41.893 22223309 1418.8224
El Salvador Americas 1952 45.262 2042865 3048.3029
Equatorial Guinea Africa 1952 34.482 216964 375.6431
Eritrea Africa 1952 35.928 1438760 328.9406
Ethiopia Africa 1952 34.078 20860941 362.1463
Finland Europe 1952 66.550 4090500 6424.5191
France Europe 1952 67.410 42459667 7029.8093
Gabon Africa 1952 37.003 420702 4293.4765
Gambia Africa 1952 30.000 284320 485.2307
Germany Europe 1952 67.500 69145952 7144.1144
Ghana Africa 1952 43.149 5581001 911.2989
Greece Europe 1952 65.860 7733250 3530.6901
Guatemala Americas 1952 42.023 3146381 2428.2378
Guinea Africa 1952 33.609 2664249 510.1965
Guinea-Bissau Africa 1952 32.500 580653 299.8503
Haiti Americas 1952 37.579 3201488 1840.3669
Honduras Americas 1952 41.912 1517453 2194.9262
Hong Kong, China Asia 1952 60.960 2125900 3054.4212
Hungary Europe 1952 64.030 9504000 5263.6738
Iceland Europe 1952 72.490 147962 7267.6884
India Asia 1952 37.373 372000000 546.5657
Indonesia Asia 1952 37.468 82052000 749.6817
Iran Asia 1952 44.869 17272000 3035.3260
Iraq Asia 1952 45.320 5441766 4129.7661
Ireland Europe 1952 66.910 2952156 5210.2803
Israel Asia 1952 65.390 1620914 4086.5221
Italy Europe 1952 65.940 47666000 4931.4042
Jamaica Americas 1952 58.530 1426095 2898.5309
Japan Asia 1952 63.030 86459025 3216.9563
Jordan Asia 1952 43.158 607914 1546.9078
Kenya Africa 1952 42.270 6464046 853.5409
Korea, Dem. Rep. Asia 1952 50.056 8865488 1088.2778
Korea, Rep. Asia 1952 47.453 20947571 1030.5922
Kuwait Asia 1952 55.565 160000 108382.3529
Lebanon Asia 1952 55.928 1439529 4834.8041
Lesotho Africa 1952 42.138 748747 298.8462
Liberia Africa 1952 38.480 863308 575.5730
Libya Africa 1952 42.723 1019729 2387.5481
Madagascar Africa 1952 36.681 4762912 1443.0117
Malawi Africa 1952 36.256 2917802 369.1651
Malaysia Asia 1952 48.463 6748378 1831.1329
Mali Africa 1952 33.685 3838168 452.3370
Mauritania Africa 1952 40.543 1022556 743.1159
Mauritius Africa 1952 50.986 516556 1967.9557
Mexico Americas 1952 50.789 30144317 3478.1255
Mongolia Asia 1952 42.244 800663 786.5669
Montenegro Europe 1952 59.164 413834 2647.5856
Morocco Africa 1952 42.873 9939217 1688.2036
Mozambique Africa 1952 31.286 6446316 468.5260
Myanmar Asia 1952 36.319 20092996 331.0000
Namibia Africa 1952 41.725 485831 2423.7804
Nepal Asia 1952 36.157 9182536 545.8657
Netherlands Europe 1952 72.130 10381988 8941.5719
New Zealand Oceania 1952 69.390 1994794 10556.5757
Nicaragua Americas 1952 42.314 1165790 3112.3639
Niger Africa 1952 37.444 3379468 761.8794
Nigeria Africa 1952 36.324 33119096 1077.2819
Norway Europe 1952 72.670 3327728 10095.4217
Oman Asia 1952 37.578 507833 1828.2303
Pakistan Asia 1952 43.436 41346560 684.5971
Panama Americas 1952 55.191 940080 2480.3803
Paraguay Americas 1952 62.649 1555876 1952.3087
Peru Americas 1952 43.902 8025700 3758.5234
Philippines Asia 1952 47.752 22438691 1272.8810
Poland Europe 1952 61.310 25730551 4029.3297
Portugal Europe 1952 59.820 8526050 3068.3199
Puerto Rico Americas 1952 64.280 2227000 3081.9598
Reunion Africa 1952 52.724 257700 2718.8853
Romania Europe 1952 61.050 16630000 3144.6132
Rwanda Africa 1952 40.000 2534927 493.3239
Sao Tome and Principe Africa 1952 46.471 60011 879.5836
Saudi Arabia Asia 1952 39.875 4005677 6459.5548
Senegal Africa 1952 37.278 2755589 1450.3570
Serbia Europe 1952 57.996 6860147 3581.4594
Sierra Leone Africa 1952 30.331 2143249 879.7877
Singapore Asia 1952 60.396 1127000 2315.1382
Slovak Republic Europe 1952 64.360 3558137 5074.6591
Slovenia Europe 1952 65.570 1489518 4215.0417
Somalia Africa 1952 32.978 2526994 1135.7498
South Africa Africa 1952 45.009 14264935 4725.2955
Spain Europe 1952 64.940 28549870 3834.0347
Sri Lanka Asia 1952 57.593 7982342 1083.5320
Sudan Africa 1952 38.635 8504667 1615.9911
Swaziland Africa 1952 41.407 290243 1148.3766
Sweden Europe 1952 71.860 7124673 8527.8447
Switzerland Europe 1952 69.620 4815000 14734.2327
Syria Asia 1952 45.883 3661549 1643.4854
Taiwan Asia 1952 58.500 8550362 1206.9479
Tanzania Africa 1952 41.215 8322925 716.6501
Thailand Asia 1952 50.848 21289402 757.7974
Togo Africa 1952 38.596 1219113 859.8087
Trinidad and Tobago Americas 1952 59.100 662850 3023.2719
Tunisia Africa 1952 44.600 3647735 1468.4756
Turkey Europe 1952 43.585 22235677 1969.1010
Uganda Africa 1952 39.978 5824797 734.7535
United Kingdom Europe 1952 69.180 50430000 9979.5085
United States Americas 1952 68.440 157553000 13990.4821
Uruguay Americas 1952 66.071 2252965 5716.7667
Venezuela Americas 1952 55.088 5439568 7689.7998
Vietnam Asia 1952 40.412 26246839 605.0665
West Bank and Gaza Asia 1952 43.160 1030585 1515.5923
Yemen, Rep. Asia 1952 32.548 4963829 781.7176
Zambia Africa 1952 42.038 2672000 1147.3888
Zimbabwe Africa 1952 48.451 3080907 406.8841
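
Printing all 142 rows is not always necessary. As a minimal sketch, assuming the same gapminder and dplyr packages loaded above, a quick structural check can confirm that the filter worked. The object name check_1952 is only illustrative and is not used elsewhere in this chapter.

```r
library(gapminder)
library(dplyr)

# Same filter as above, rebuilt here so the snippet runs on its own
check_1952 <- gapminder %>%
  filter(year == 1952)

# One row per country is expected for a single year
nrow(check_1952)

# Confirm that only the year 1952 remains
unique(check_1952$year)

# Compact overview of column types and first values
glimpse(check_1952)
```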

Comparing population and GDP per capita

In the video you learned to create a scatter plot with GDP per capita on the x-axis and life expectancy on the y-axis. When you’re exploring data visually, you’ll often need to try different combinations of variables and aesthetics. The code below changes that plot so that population (pop) is on the x-axis and GDP per capita (gdpPercap) is on the y-axis.

ggplot(gapminder_1952, aes(x = pop, y = gdpPercap)) +
  geom_point() + 
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle="GDP per capita by population", 
       y="GDP per capita", 
       x="Population", 
       title="Scatterplot", 
       caption = "")

Each point represents a country: can you guess which country any of the points represents?
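
One way to check a guess is to sort the data and look at the extremes. The sketch below assumes the gapminder and dplyr packages used earlier; the names richest and largest are illustrative only.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Country with the highest GDP per capita (the top-most point)
richest <- gapminder_1952 %>%
  arrange(desc(gdpPercap)) %>%
  head(1)

# Country with the largest population (the right-most point)
largest <- gapminder_1952 %>%
  arrange(desc(pop)) %>%
  head(1)

richest$country
largest$country
```

According to the 1952 table above, these are Kuwait and China respectively.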

Comparing population and life expectancy

In this exercise, you’ll use ggplot2 to create a scatter plot from scratch, to compare each country’s population with its life expectancy in the year 1952.

# Create a scatter plot with pop on the x-axis and lifeExp on the y-axis
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle="Country's population with its life expectancy in the year 1952", 
       y="Life Expectancy", 
       x="Population", 
       title="Scatterplot", 
       caption = "")

You might notice the points are crowded towards the left side of the plot, making them hard to distinguish.
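
To put a number on that crowding, the following sketch (again assuming the gapminder and dplyr packages) measures how widely the populations are spread. The variable names are illustrative.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Smallest and largest populations in 1952
pop_range <- range(gapminder_1952$pop)
pop_range

# Number of orders of magnitude between them
orders_of_magnitude <- log10(pop_range[2] / pop_range[1])
orders_of_magnitude

# Share of countries whose population is below a tenth of the maximum;
# on a linear axis these all fall in the left-most 10% of the plot
share_left <- mean(gapminder_1952$pop < pop_range[2] / 10)
share_left
```

The populations span roughly four orders of magnitude, so a handful of very large countries stretch the linear axis and squeeze the rest to the left.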

Putting the x-axis on a log scale

You previously created a scatter plot with population on the x-axis and life expectancy on the y-axis. Since population is spread over several orders of magnitude, with some countries having a much higher population than others, it’s a good idea to put the x-axis on a log scale.

# Change this plot to put the x-axis on a log scale
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle="Country's population (log scale) with its life expectancy in the year 1952", 
       y="Life Expectancy", 
       x="Population", 
       title="Scatterplot", 
       caption = "")

Notice the points are more spread out on the x-axis. This makes it easier to see that there is no clear correlation between population and life expectancy.
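
The visual impression can be cross-checked numerically. This is a rough sketch using base R's cor(), not part of the original exercise; the names r_log and r_rank are illustrative.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Pearson correlation between log10(population) and life expectancy
r_log <- cor(log10(gapminder_1952$pop), gapminder_1952$lifeExp)
r_log

# Rank-based (Spearman) correlation, which is unaffected by the log transform
r_rank <- cor(gapminder_1952$pop, gapminder_1952$lifeExp, method = "spearman")
r_rank
```

Values close to zero would support what the plot suggests; a correlation coefficient only summarizes monotone or linear association, so the plot remains the primary evidence.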

Putting the x- and y- axes on a log scale

Suppose you want to create a scatter plot with population on the x-axis and GDP per capita on the y-axis. Both population and GDP per-capita are better represented with log scales, since they vary over many orders of magnitude.

# Scatter plot comparing pop and gdpPercap, with both axes on a log scale
ggplot(gapminder_1952, aes(x = pop, y = gdpPercap)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() + 
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle="Country's population (log scale) with GDP per capita (log scale) in the year 1952", 
       y="GDP per capita", 
       x="Population", 
       title="Scatterplot", 
       caption = "")

Notice that the y-axis goes from 1e3 (1,000) to 1e4 (10,000) to 1e5 (100,000) in equally spaced steps: on a log scale, each step multiplies the value by 10.
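
The equal spacing follows directly from the logarithm. A small base R sketch, using the three tick values mentioned above:

```r
# Tick values on the log-scaled y-axis
breaks <- c(1e3, 1e4, 1e5)

# Position of each tick on the log10 scale
log_positions <- log10(breaks)
log_positions

# Distance between neighbouring ticks on the log10 scale
diff(log_positions)

# On the original scale the same steps are far from equal
diff(breaks)
```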

Adding color to a scatter plot

In this lesson you learned how to use the color aesthetic, which can be used to show which continent each point in a scatter plot represents.

# Scatter plot comparing pop and lifeExp, with color representing continent
ggplot(gapminder_1952, aes(x = pop, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  labs(subtitle="Country's population (log scale) with Life expectancy in the year 1952", 
       y="Life expectancy", 
       x="Population", 
       title="Scatterplot colored by continent", 
       caption = "")
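
It can help to contrast a mapped color with a fixed one. In the sketch below, which assumes the same packages as the rest of the chapter, p_mapped and p_fixed are illustrative names and "steelblue" is an arbitrary choice of color.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Mapped: color inside aes() varies with the continent column and adds a legend
p_mapped <- ggplot(gapminder_1952, aes(x = pop, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10()

# Fixed: color outside aes() paints every point the same and adds no legend
p_fixed <- ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point(color = "steelblue") +
  scale_x_log10()

p_mapped
p_fixed
```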

Adding size and color to a plot

In the last exercise, you created a scatter plot communicating information about each country’s population, life expectancy, and continent. Now you’ll use the size of the points to communicate even more.

# Add the size aesthetic to represent a country's gdpPercap
ggplot(gapminder_1952, aes(x = pop, y = lifeExp, color = continent, size = gdpPercap)) +
  geom_point() +
  scale_x_log10() +
  labs(subtitle="Country's population (log scale) with Life expectancy in the year 1952", 
       y="Life expectancy", 
       x="Population", 
       title="Scatterplot colored by continent, sized by GDP per capita", 
       caption = "")
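
By default the legends are titled with the raw column names (continent and gdpPercap). As a possible refinement, not part of the original exercise, labs() can also rename legend titles; p_sized is an illustrative name.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

p_sized <- ggplot(gapminder_1952, aes(x = pop, y = lifeExp,
                                      color = continent, size = gdpPercap)) +
  geom_point() +
  scale_x_log10() +
  # Legend titles default to the column names; labs() can rename them
  labs(x = "Population (log scale)",
       y = "Life expectancy",
       color = "Continent",
       size = "GDP per capita")

p_sized
```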

Creating a subgraph for each continent

You’ve learned to use faceting to divide a graph into subplots based on one of its variables, such as the continent.

# Scatter plot comparing pop and lifeExp, faceted by continent
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(~ continent) +
  labs(subtitle="Country's population (log scale) with Life expectancy in the year 1952 by Continent", 
       y="Life expectancy", 
       x="Population", 
       title="Scatterplot of each continent", 
       caption = "")

Faceting is a powerful way to understand subsets of your data separately.
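
The same per-continent split can be done numerically with dplyr. This sketch assumes the packages loaded earlier; continent_summary and its column names are illustrative.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# One summary row per continent, matching the panels of the faceted plot
continent_summary <- gapminder_1952 %>%
  group_by(continent) %>%
  summarize(
    countries = n(),
    median_lifeExp = median(lifeExp),
    median_pop = median(pop)
  )

continent_summary
```

The countries column shows how many points each panel contains, which explains why the Oceania panel has only two.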

Faceting by year

All of the graphs in this chapter have been visualizing statistics within one year. Now that you’re able to use faceting, however, you can create a graph showing all the country-level data from 1952 to 2007, to understand how global statistics have changed over time.

# Scatter plot comparing gdpPercap and lifeExp, with color representing continent
# and size representing population, faceted by year
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) + 
  geom_point() +
  scale_x_log10() + 
  facet_wrap(~ year) +
  labs(subtitle="GDP per capita (log scale) with life expectancy, colored by continent and sized by population", 
       y="Life expectancy", 
       x="GDP per Capita", 
       title="Scatterplot, every 5 years from 1952 to 2007", 
       caption = "")
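
Before faceting by year, it can be useful to confirm which years the data contains, since each distinct value becomes one panel. A short sketch assuming the gapminder package; the name years is illustrative.

```r
library(gapminder)

# Distinct years in the full dataset; facet_wrap(~ year) draws one panel per value
years <- sort(unique(gapminder$year))
years

# Number of panels to expect
length(years)

# Gap between consecutive years
unique(diff(years))
```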